Executive Summary

The concept of computerized adaptive testing (CAT) was developed to deal with the statistical aspects of 
ability testing. The test generally starts out with a question of approximately average difficulty. Based on a test 
taker’s response, subsequent items are chosen that are more appropriate for the ability level of the test taker. In 
applying CAT within a large-scale testing program, the selection of questions for administration to a test taker 
cannot be based solely on item difficulty, as described above. Constraints must be imposed on the selection of 
items that assure that every test taker receives a test that appropriately covers the domain of content the test 
proposes to measure. Issues such as blended reading load and the proper distribution of answer keys must also be 
addressed. The goal is that a CAT that incorporates these additional constraints should still provide the reduced test 
length and improved precision that is promised by this technology. 

In this paper, an adaptive testing procedure is proposed in which the content distribution, reading load, and 
answer key distribution of the test are controlled by explicit constraints imposed on the item selection process. The 
process begins by assembling a full test that meets the constraints and provides the best measurement for the initial 
ability estimate for the test taker. The item from this full test that best matches the ability level of the test taker is 
then chosen for administration to the test taker. After the first item is presented to the test taker and scored, the 
ability level of the test taker is updated. A full test is then reassembled that includes the item already administered, 
is appropriate for the updated ability estimate, and meets all of the constraints. The question that is most 
appropriate for the current ability estimate of the test taker is then selected from the newly assembled test. This 
process continues until a full test has been administered. This procedure assures that the full test will meet all of the 
necessary test assembly constraints and will be appropriate for the ability level of the test taker. 

A simulation study using a pool of 753 Law School Admission Test (LSAT) items was run to assess the 
practical feasibility of this procedure. Results indicated that the computer processing time needed to reassemble the 
test and select the next item was always between one and two seconds. Also, for realistic test lengths, the effect of 
imposing the set of constraints on the item selection process appeared to have no discernible effects on the 
statistical properties of the final ability estimate. 



Abstract 

A model for constrained computerized adaptive testing is proposed in which the information in the test at the 
ability estimate is maximized subject to a large variety of possible constraints on the contents of the test. At each 
item-selection step, a full test is first assembled to have maximum information at the current ability estimate, with
the previously administered items fixed. Then the item with maximum information is selected from the test. All test
assembly is optimal due to the use of a linear programming model that is automatically updated to allow for the 
attributes of the items already administered as well as the new value of the ability estimator. A simulation study 
using a pool of 753 items from the Law School Admission Test (LSAT) showed that for adaptive tests of realistic 
lengths the ability estimator did not suffer any loss of efficiency from the presence of 433 constraints on the item 
selection process. 



Introduction 

The concept of adapting the difficulty of the test to the ability of the individual examinee is as old as the first 
intelligence test (Binet & Simon, 1905). In the Binet-Simon test, the items varied according to age group and the 
examiner was instructed to infer the next age group from the responses of the examinee to the previous test items 
until the true age group could be identified with sufficient certainty. In doing so, Binet and Simon intuitively 
followed the statistical principle that the information provided by test items is maximal if their difficulty matches 
the level of ability of the examinee. 




The authors are indebted to Wim M. M. Tielen for writing the simulation program and to David A. Schweizer for adapting the 
CONSOL software. Address all correspondence to W. J. van der Linden, Department of Educational Measurement and Data 
Analysis, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands, e-mail: vanderlinden@edte.utwente.nl. 

Since modern group-based testing was introduced, attempts have been made to implement this principle of
adaptivity in a practical format. One of the first attempts was two-stage testing, a testing format in which the score
on a routing test directs the examinee to one of a limited number of measurement tests. In the self-scoring flexilevel
test, a testing format proposed by Lord (1980, chap. 8), the examinee scores his/her own responses by scratching an
answer sheet and is instructed to move on to the next item as a function of the correctness of the response. In 
Weiss’ (1973) computerized stradaptive test, the items in the pool are divided into strata of difficulty and ordered 
according to their discrimination power within each stratum. The examinee moves to the next item in the higher 
stratum if his/her response is correct but to a lower stratum if it is incorrect. For a more extensive description of 
these early forms of adaptive testing, see Wainer (1990) or Weiss (1985). 

With the advent of powerful personal computers and the acceptance of item response theory (IRT) as a tool for 
calibrating item pools, large-scale application of fully computerized adaptive testing (CAT) has become possible. A 
well-known procedure in adaptive testing is maximum-information item selection in combination with maximum- 
likelihood estimation of ability. In this paper, it is assumed that the responses to the items in the pool fit the three- 
parameter logistic (3-PL) response model 



$$P_i(\theta) \equiv \mathrm{Prob}\{U_i = 1\} \equiv c_i + (1 - c_i)\left[1 + \exp(-a_i(\theta_a - b_i))\right]^{-1}, \tag{1}$$

where $\theta_a \in (-\infty, \infty)$ is a parameter for the ability of examinee $a$, and $b_i \in (-\infty, \infty)$ and $a_i \in [0, \infty)$ are parameters
for the difficulty and discrimination power of item $i$, respectively. For this model Fisher's information on $\theta$ in item $i$
can be shown to be equal to

$$I_i(\theta) = a_i^2 P_i(\theta) Q_i(\theta), \tag{2}$$

with $Q_i(\theta) = 1 - P_i(\theta)$. The maximum-information principle selects the next item to have a maximum value for
(2) at the current ability estimate. With a modern PC the time needed to calculate the maximum-likelihood ability
estimate and select the item with maximum information from an item pool of realistic size is hardly noticeable by
the examinee.
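
The maximum-information principle can be illustrated with a minimal Python sketch. The item parameters below are invented for illustration, and the function names are hypothetical. The information function uses the standard 3-PL expression, which coincides with the form in (2) when the lower asymptote $c_i$ equals 0.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3-PL model, Equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3-PL item at theta (standard 3-PL expression)."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_max_information(theta_hat, pool, administered):
    """Return the index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical (a, b, c) parameters, invented for illustration only.
pool = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2), (1.5, 0.1, 0.0)]
best = select_max_information(0.0, pool, administered=set())
```

In this toy pool the last item is selected at an ability estimate of 0, because it combines high discrimination with a difficulty close to the estimate.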

Paradoxically, now that fully computerized CAT is technically possible the interest seems to be moving back to 
earlier forms of adaptive testing. The reason for this unexpected development lies in the fact that the original 
conception of CAT focuses entirely on the statistical aspects of item selection and ability estimation and ignores all 
other test specifications typically in use in testing programs. As a consequence, it may lead to testing programs that: 

1. do not guarantee equal composition of tests across examinees, and hence lose their face validity;

2. exclude the use of item pools with dependencies between the items, for example, between items that 
cannot be administered in the same test because one item contains a clue to the solution to another item or 
between items that have to be presented in sets because they are linked to a common stimulus; 

3. overexpose some items, with the potential danger that the items become known prematurely to the 
examinees; 

4. do not allow for the possibility of reviewing responses to earlier items — a feature some programs want to 
offer to their examinees. 

Several solutions to these problems have been proposed. Wainer and Kiely (1987) suggest adaptive testing 
from a pool of testlets rather than individual items, designing the testlets to ensure adequate content coverage in the 
individual tests. The same goal is addressed in the proposal by Kingsbury and Zara (1991) who suggest spiraling 
item selection along subsets of items in the pool defining relevant content dimensions. Adema (1990) and Luecht 
(1995) use optimization techniques to assemble a system of two-stage tests with each possible route meeting the 
same set of test specifications. Reese and Schnipke (1996) combine the ideas of two-stage and testlet-based testing.
A probabilistic mechanism to govern the exposure rates of items in CAT is presented in Sympson and Hetter
(1985). Stocking and Swanson (1993) propose a heuristic for sequential item selection that treats the test
specifications as well as the goal of maximum information as “desirable properties” of the test and then
compromises between them at each item-selection step.

It is the purpose of the present paper to propose a new form of constrained CAT. The procedure starts with the 
on-line assembly of a full test that meets all of the specifications and has maximum information at an initial 
estimate of the ability of the examinee. The assembly of the test is optimal due to the use of a linear programming 
(LP) model of the test specifications. The first item to be administered is selected from this test according to the 
maximum-information principle. At each next step, the LP model is updated to allow for the values of the attributes 
of the items already administered, and the remaining part of the test is reassembled to have maximum information 
at the new ability estimate. The approach improves on conventional multistage or testlet-based adaptive testing 
designs in that there is no need to assemble fixed subtests or testlets in advance. All test assembly is online to 
ensure maximum information at the current ability estimate. At the same time, unlike conventional CAT, item 
selection automatically satisfies the test specifications. The idea to base CAT on a process of reassembling full tests 
was developed independently by Cordova (1996). The approach is an alternative to the sequential heuristic 
proposed by Stocking and Swanson (1993); it is more rigorously based on the ideas developed for the application 
of LP to optimal test assembly, does guarantee that all of the test specifications are met, and has the explicit 
objective of maximum information in the test. A discussion of the precise differences between existing approaches 
and the present approach to constrained adaptive testing is postponed until the latter has been presented in 
more detail. 

In the remaining part of the paper, constrained adaptive test assembly is first conceptualized as an adaptive 
solution to an LP model for test assembly. An example of a model is given and possible implementations are 
discussed. For two different implementations, the statistical properties of the ability estimator are compared in a 
simulation study using an existing item pool for the LSAT. 

General Model of Constrained Test Assembly 

The concept underlying the following sections is that the process of test assembly can be characterized as an 
instance of constrained optimization. Formally, each constrained optimization problem has: (1) an objective 
function defined on the decision variables of the problem that is maximized or minimized; and (2) a series of 
constraints on the possible values of the decision variables which together define a feasible solution to the problem. 
In test assembly, for example, the objective may be to match the test information function to a target and the 
constraints may require that prespecified numbers of items be selected from certain content categories. If the 
objective function and constraints are linear in the decision variables, the problem belongs to the domain of LP, 
which has a large body of algorithms and heuristics to solve its problems. A large variety of conventional test 
assembly problems have been shown to lend themselves to modeling as an LP problem with 0-1 decision 
variables. Some relevant references are: Adema (1992a, 1992b), Adema, Boekkooi-Timminga, and van der Linden 
(1991), Adema and van der Linden (1989), Armstrong and Jones (1992), Armstrong, Jones, and Wu (1992), 
Boekkooi-Timminga (1987, 1990), Theunissen (1985, 1986), Timminga and Adema (1995, 1996), van der Linden 
(1994; in press), van der Linden and Boekkooi-Timminga (1988), and van der Linden and Luecht (1996). 

An important distinction in test assembly is the one between constraints on categorical and quantitative 
attributes of test items. Categorical attributes introduce a partitioning of the item pool with different subsets of 
items corresponding to different levels of the attribute. Some examples of categorical attributes are: item content, 
cognitive level, item format, and gender orientation. A quantitative attribute is a parameter or coefficient with 
possibly different numerical values for each item. Examples of this type of attribute are: item p-value, expected 
response time, and item exposure rate. Constraints may also be needed to guarantee that items linked to the same 
stimulus are administered as sets. In addition, these stimuli themselves may involve constraints on categorical (e.g., 
content classification) or quantitative attributes (e.g., word count). 

The problem of constrained CAT can now be represented as a series of updates of the following optimization
problem:

maximize    information at current ability estimate    (3)

subject to possible constraint(s) on the

    length of the test;                     (4)
    number of item sets in the test;        (5)
    number(s) of items per item set;        (6)
    categorical item attributes;            (7)
    quantitative item attributes;           (8)
    dependencies between items in sets;     (9)
    categorical item set attributes;        (10)
    quantitative item set attributes.       (11)



In addition, a few technical constraints may be necessary to solve the optimization problem. The following section 
gives an example of an LP formulation of this verbally stated problem. 

Example 

To present the example, the following definitions are needed: The items in the pool are indexed by $i = 1, \ldots, I$.
In addition, the pool is assumed to consist of item sets, $V_j$, $j = 1, \ldots, J$, each of which may have a different number
of items. For each item a decision variable $x_i$ is used which takes the value 1 if the item is included in the test and
the value 0 otherwise. Likewise, a second decision variable $z_j$ is used to decide whether ($z_j = 1$) or not ($z_j = 0$) item
set $j$ is included in the test. In addition, the exemplary attributes in Table 1 are used.

TABLE 1

Exemplary item and item set attributes

Attribute                           Value
Cognitive level of item             Reading Comprehension ($C_1$);
                                    Analytic Reasoning ($C_2$);
                                    Logical Reasoning ($C_3$)
Expected response time for item     $r_i \in (0, \infty)$
Frequency of previous item usage    $f_i \in \{0, 1, \ldots\}$
Content of item set                 Humanities ($S_1$); Social Sciences ($S_2$)



The following example of the test assembly problem is given:

maximize  $\sum_{i=1}^{I} I_i(\hat{\theta}) x_i$    (maximum information at $\hat{\theta}$)    (12)

subject to

$\sum_{i=1}^{I} x_i = n$,    (test length)    (13)

$\sum_{j=1}^{J} z_j = m$,    (number of item sets)    (14)

$\sum_{i \in V_j} x_i \le n_j^{(u)} z_j$, $j = 1, \ldots, J$,    (number of items in item set $j$)    (15)

$\sum_{i \in V_j} x_i \ge n_j^{(l)} z_j$, $j = 1, \ldots, J$,    (idem)    (16)

$\sum_{i \in C_h} x_i \le n_h^{(u)}$, $h = 1, 2, 3$,    (number of items per cognitive level)    (17)

$\sum_{i \in C_h} x_i \ge n_h^{(l)}$, $h = 1, 2, 3$,    (idem)    (18)

$\sum_{i=1}^{I} r_i x_i \le r^{(u)}$,    (response time available)    (19)

$f_i x_i \le f^{(u)}$, $i = 1, \ldots, I$,    (maximum item exposure)    (20)

$\sum_{j \in S_g} z_j \le n_g^{(u)}$, $g = 1, 2$,    (number of item sets per content category)    (21)

$\sum_{j \in S_g} z_j \ge n_g^{(l)}$, $g = 1, 2$,    (idem)    (22)

$x_{31} + x_{32} + x_{33} + x_{34} \le 1$,    (mutually exclusive items)    (23)

$z_8 + z_9 + z_{10} \le 1$,    (mutually exclusive item sets)    (24)

$x_i = 0, 1$, $i = 1, \ldots, I$,    (domain of decision variables)    (25)

$z_j = 0, 1$, $j = 1, \ldots, J$.    (idem)    (26)



The right-hand side coefficients in the constraints are bounds on numbers of items ($n$) or item sets ($m$). Upper and
lower bounds are denoted by a corresponding superscript. Note that some of the constraints are formulated using
the decision variables for the items ($x_i$) and others using the variables for the item sets ($z_j$). The constraints in (15)-(16)
have both types of variables to ensure that items from a set are chosen only if the set itself is chosen, and then
in a number between the bounds for that set. It is evident that the model only has a solution if the numbers in the right-hand side
coefficients are chosen consistently and the pool has enough items to satisfy these numbers. These conditions are
assumed to be met in a deliberately designed CAT program.
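
A reduced instance of the model may help to fix ideas. The Python sketch below keeps only the objective in (12), the test-length constraint in (13), bounds per cognitive level as in (17)-(18), and one exclusion constraint as in (23). The pool, the information values, and the bounds are invented for illustration. Exhaustive enumeration is used here in place of an LP solver, which is workable only for a toy pool of this size.

```python
from itertools import combinations

# Hypothetical toy pool: information at the current ability estimate and cognitive level.
info = [0.56, 0.24, 0.15, 0.40, 0.31, 0.08, 0.47, 0.22]
level = [1, 1, 2, 2, 3, 3, 1, 2]

n = 4                                    # test length, as in (13)
lower = {1: 1, 2: 1, 3: 1}               # lower bounds per level, as in (18)
upper = {1: 2, 2: 2, 3: 2}               # upper bounds per level, as in (17)
exclusive = {0, 6}                       # at most one of these items, as in (23)

def feasible(test):
    """Check the level bounds and the mutual-exclusion constraint."""
    for h in lower:
        count = sum(1 for i in test if level[i] == h)
        if not lower[h] <= count <= upper[h]:
            return False
    return len(exclusive.intersection(test)) <= 1

def assemble(fixed=()):
    """Return the feasible n-item test with maximum information that contains `fixed`."""
    best, best_value = None, float("-inf")
    for test in combinations(range(len(info)), n):   # enforces (13) and 0-1 variables
        if not set(fixed) <= set(test) or not feasible(test):
            continue
        value = sum(info[i] for i in test)           # objective (12)
        if value > best_value:
            best, best_value = test, value
    return best, best_value

test, value = assemble()
```

The `fixed` argument anticipates the adaptive use of the model: passing the indices of items already administered forces them into the assembled test.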

The model in (12)-(26) is equivalent to the maximin model for test assembly (van der Linden & Boekkooi-Timminga,
1989), with the exception that it does not maximize the information in the test proportionally at a
number of $\theta$ values but at an estimate of $\theta$ for a single examinee. A review of the constraints available to model a
large variety of test specifications is given in the same paper.

Models for test assembly as in (12)-(26) can be solved for an optimal test (= set of values for the decision
variables) using a standard software package for LP or a choice from the algorithms and heuristics offered in the
test assembly package ConTEST (Timminga, van der Linden, & Schweizer, 1996). For test assembly models with
the special structure of a network-flow problem, efficient algorithms are possible (Armstrong, Jones, & Wu, 1992).
Typically, the use of each of these algorithms is preceded by some form of preprocessing of the model or the item
pool; for example, a solution of a model with a constraint as in (20) is generally obtained quicker if all items
with $f_i > f^{(u)}$ are first removed from the pool.
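
As a minimal sketch of this kind of preprocessing, the following Python fragment removes items whose usage frequency exceeds the exposure bound before the model is passed to a solver. The usage counts and the bound are hypothetical.

```python
# Hypothetical usage counts f_i for a six-item pool and a hypothetical bound f_upper.
usage = [3, 12, 0, 7, 25, 9]
f_upper = 10

# Keep only the items that can still satisfy the exposure constraint.
eligible = [i for i, f in enumerate(usage) if f <= f_upper]
```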

The next section discusses how to implement models as in (12)-(26) in a CAT program.

Adaptive Implementation of the Model

It is assumed that the test stops as soon as $n$ items are administered. Other stopping rules are possible but this
rule is believed to enhance the face validity of the test. Adaptive implementation of the model in (12)-(26) involves
the on-line execution of the following steps for each examinee:

Step 1: Initialize the model; 

Step 2: Assemble an initial test according to the model; 

Step 3: Administer the item with maximum information at the ability estimate; 

Step 4: Update the model; 

Step 5: Reassemble the remaining part of the test putting the items not administered back into the pool; 

Step 6: Repeat Steps 3-5 until n items have been administered.

The algorithm is adaptive because of Step 4. The update of the model in this step involves both an update of $\hat{\theta}$
in the objective function in (12) and an update to allow for the attributes of the item administered. The only thing
needed to perform the latter is to insert a constraint into the model that sets the decision variable of this item equal
to 1. For example, if Item 22 is selected, the constraint $x_{22} = 1$ is inserted.

Note that when reassembling the remaining part of the test in Step 5, the items not yet administered are put 
back into the pool. Hence, the newly assembled part of the test is always at least as good as the old part but most 
likely better since the ability estimate has been updated. Also, if a feasible solution to the model exists for the 
initial test, the problem of reassembling later parts of the test remains feasible. 
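
The steps of the algorithm can be sketched in Python for a toy pool. Everything specific in this sketch is an assumption made for illustration: the pool, the bounds on a single categorical attribute, the quadrature grid, and the function names. Exhaustive search stands in for the LP solver, and ability is updated with an EAP estimate under a uniform prior.

```python
import math
import random
from itertools import combinations

def prob(theta, a, b, c):
    """3-PL response probability, Equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b, c):
    """Standard 3-PL Fisher information at theta."""
    p = prob(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def eap(responses, pool):
    """EAP estimate of theta on a grid with a uniform prior (hypothetical grid)."""
    grid = [-4.0 + 0.1 * k for k in range(81)]
    weights = []
    for theta in grid:
        like = 1.0
        for i, u in responses:
            p = prob(theta, *pool[i][:3])
            like *= p if u == 1 else 1.0 - p
        weights.append(like)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def assemble(theta, pool, n, bounds, fixed):
    """Best feasible n-item test containing `fixed` (exhaustive search, toy pools only)."""
    best, best_value = None, float("-inf")
    free = [i for i in range(len(pool)) if i not in fixed]
    for rest in combinations(free, n - len(fixed)):
        test = list(fixed) + list(rest)
        counts = {}
        for i in test:
            counts[pool[i][3]] = counts.get(pool[i][3], 0) + 1
        if any(not lo <= counts.get(h, 0) <= hi for h, (lo, hi) in bounds.items()):
            continue
        value = sum(information(theta, *pool[i][:3]) for i in test)
        if value > best_value:
            best, best_value = test, value
    return best

def constrained_cat(pool, n, bounds, true_theta, rng):
    """Run one simulated constrained adaptive test and return items and final estimate."""
    theta_hat, administered, responses = 0.0, [], []                  # Step 1
    while len(administered) < n:                                      # Step 6
        test = assemble(theta_hat, pool, n, bounds, administered)     # Steps 2 and 5
        free = [i for i in test if i not in administered]
        item = max(free, key=lambda i: information(theta_hat, *pool[i][:3]))  # Step 3
        u = 1 if rng.random() < prob(true_theta, *pool[item][:3]) else 0
        administered.append(item)                   # Step 4: fix the variable of this item at 1
        responses.append((item, u))
        theta_hat = eap(responses, pool)            # Step 4: update the ability estimate
    return administered, theta_hat

rng = random.Random(1)
# Hypothetical pool of (a, b, c, content category) tuples.
pool = [(rng.uniform(0.7, 1.8), rng.uniform(-2.0, 2.0), 0.2, k % 3) for k in range(12)]
bounds = {0: (1, 2), 1: (1, 2), 2: (1, 2)}
items, theta_hat = constrained_cat(pool, 5, bounds, true_theta=0.5, rng=rng)
```

Because the administered items are always part of the last feasible test, each call to `assemble` has at least one feasible solution, which mirrors the feasibility argument given above.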

In Table 2 the algorithm is illustrated for a 5-item test. The items in the upper triangle are the items already 
administered. The items in the lower triangle form the part of the test reassembled using the updated model 
(Step 5). The bold numbers in this triangle are the items selected according to the maximum-information principle. 
Note that bold numbers are moved to the upper triangle in the next column of the table. 

TABLE 2

Example of a 5-item constrained adaptive test

Selection of Item    #1     #2     #3     #4     #5
                      -     39     39     39     39
                     13      -     14     14     14
                     27      8      -     41     41
                     28     14*    22      -     22
                     39*    41     37     22*     -
                     41     49     41*    37      6*

Note. Numbers in upper triangle are items already administered. Numbers in
lower triangle are items in the reassembled part of the test. Bold numbers, marked here
with an asterisk, are items selected according to the maximum-information principle.



Possible Initializations of the Model 

How the model should be initialized in Step 1 has not yet been explained. An obvious way to do so is to 
choose a plausible value for $\theta$ based on knowledge of the ability distribution of the population of examinees and
to choose the values for the bounds in the constraints on the basis of the test specifications. A more sophisticated
initialization of $\theta$ is to choose a value based on prior information on the values of relevant background variables
for the examinee. A method for estimating $\theta$ directly from background variables is presented in van der Linden
(submitted). An alternative is to choose a prior value for $\theta$ and administer a short CAT as a pretest, ignoring the
constraints in the model. The suggestion is based on the observation that the presence of large numbers of
constraints in the test assembly models may slow down the convergence of the ability estimator. Therefore, it may
be advantageous to relax the algorithm first and impose the constraints on the item selection process when the
ability estimator has had some time to stabilize. Stabilization has been shown to be remarkably quick for a
Bayesian alternative to the maximum-information principle of item selection known as the Maximum Predicted
Posterior Expected Information Criterion (van der Linden, 1996). If the constraints are introduced at a later
moment in the test, the decision variables of the items already administered have to be fixed at 1. Of course, to
keep the original model feasible, the pretest cannot be longer than the smallest upper bound in the right-hand sides
of the constraints on item numbers in the model.

Item Sets and Item Review 

The presence of item sets in the pool entails no special measures as long as the structure of the pool has been
modeled correctly by constraints such as those in (14)-(15), (21)-(22), and (24) in the exemplary model. If an item
set is chosen, an optimal number of items in the set between the given bounds are also chosen. Normally, item 
sets are to be administered intact. If so, Steps 4 and 5 in the algorithm are postponed until the last item in the 
set has been administered. In the (unlikely) case that the items need not be administered as an intact set, the 
procedure can just be continued and the algorithm automatically selects the right number of items from the set 
at optimal moments. 

If the examinees are given the opportunity to review their responses within blocks of items, the only possible 
consequence is a revision of the ability estimate if some of the responses are changed. Thus, when moving to a next 
block, $\hat{\theta}$ may have to be revised but the set of constraints in the model need not be updated.

Statistical Properties of the Ability Estimator 

To study the effect of constraints in the adaptive item selection process on the ability estimator for a realistic 
adaptive testing program, a simulation study was run using a pool of 753 items from the LSAT. The pool consisted 
of three different sections, which are labeled here as SA, SB, and IA. All items were calibrated using the 3-PL 
model given in (1). The length of the adaptive test was set equal to n = 50, with the following distribution of items 
across sections: SA: 12 items; SB: 14 items; and IA: 24 items. Large numbers of linear constraints were imposed 
on the item selection process to deal with the item-set structure of the pool as well as existing specifications with 
respect to item (sub)types, types of stimuli in item sets, gender and minority orientation of the stimuli, answer key 
distributions, and word counts. The numbers of decision variables and constraints in the model for the complete
test as well as its three sections are given in Table 3. 

TABLE 3

Numbers of items, item sets, decision variables, and constraints in the model

Level    Number of Items    Number of Item Sets    Number of Variables    Number of Constraints
Test     753                3                      804                    433
SA       208                24                     232                    179
SB       240                24                     264                    218
IA       305                0                      305                    30



The following three different conditions were simulated: 

1. Constrained CAT, with sections in the order IA, SA, SB;

2. Constrained CAT, with sections in the order SA, SB, IA; 

3. Unconstrained CAT. 

Because Section IA was least severely constrained, a comparison between the results for the first two conditions
shows the effect of imposing the majority of the constraints after the ability estimator is stabilized. The comparison
between the first two and the last condition shows the effect of the 433 constraints on the ability estimator.

Adaptive tests were simulated for $\theta = -2.0, -1.5, \ldots, 2.0$, and the procedure was replicated 100 times for each $\theta$
value. Ability was estimated using the EAP estimator with a uniform prior distribution. The initial ability estimate
was set equal to 0. At each step the LP model was solved using the First Acceptable Integer Solution Algorithm
(Adema, 1992b; Timminga, van der Linden, & Schweizer, 1996, sect. 6.6). This heuristic is based on the following
adaptation of the branch-and-bound method. Let $z_{LP}$ be the value of the objective function in the solution to the
relaxed model. This value serves as an upper bound to the solution of the model with 0-1 variables. The branch-and-bound
search is stopped as soon as the current solution is larger than $h_1 z_{LP}$, with $h_1 < 1$ but large enough to
guarantee a satisfactory result. In addition, following Crowder, Johnson, and Padberg (1983), the optimal reduced
costs in the relaxed solution are used to fix some of the nonbasic variables. Let $d_j$ be the costs associated with
nonbasic variable $x_j$. Then, if $x_j = 0$ in the relaxed solution and $z_{LP} - h_2 z_{LP} < d_j$, $h_2 < 1$, the variable is fixed to 0.
Likewise, $x_j$ is fixed to 1 if $x_j = 1$ in this solution and $z_{LP} - h_2 z_{LP} < -d_j$. For the LP models in the present example,
the best setting found was $h_1 = .90$ and $h_2 = .91$. Parameter $h_2$ has to be set larger than $h_1$, but if it is set too high,
overconstraining may occur. In manual test assembly, the heuristic is then rerun with a lower value for this 
parameter. In the current framework of adaptive testing, however, it was decided not to reassemble the test and to 
select the next item simply from the last test assembled. The effect of this measure, which was applied for 4.06% of 
all items selected in this study, is possibly less than optimal item selection and hence underestimation of the 
efficiency of the ability estimator. The results from the comparison between the mean-squared error (MSE) of the 
ability estimator in the constrained and unconstrained adaptive modes presented below are therefore expected to be 
slightly conservative with respect to the former. 
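
The two rules of the heuristic can be written down directly. The Python sketch below is a literal transcription of the stopping rule and the variable-fixing rule as stated above. The function names are hypothetical, and the sign convention of the reduced costs is taken from the text rather than from any particular solver, so it would have to be checked against the software actually used.

```python
def acceptable(incumbent_value, z_lp, h1=0.90):
    """Stopping rule: accept the first integer solution whose value exceeds h1 * z_LP."""
    return incumbent_value > h1 * z_lp

def fix_by_reduced_cost(x_relaxed, reduced_cost, z_lp, h2=0.91):
    """Return {variable index: fixed value} for nonbasic variables that can be fixed."""
    gap = z_lp - h2 * z_lp
    fixed = {}
    for j, (x, d) in enumerate(zip(x_relaxed, reduced_cost)):
        if x == 0 and gap < d:
            fixed[j] = 0          # variable at 0 in the relaxed solution stays at 0
        elif x == 1 and gap < -d:
            fixed[j] = 1          # variable at 1 in the relaxed solution stays at 1
    return fixed

# Hypothetical relaxed solution and reduced costs for four nonbasic variables.
fixed = fix_by_reduced_cost([0, 1, 0, 1], [1.5, -2.0, 0.3, -0.1], z_lp=10.0)
```

With these invented numbers the gap is about 0.9, so only the first two variables are fixed; the other two remain free for the branch-and-bound search.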

All runs were made on a PC with a Pentium/133 MHz processor. The CPU times needed to select an item in the
constrained mode, that is, to update $\hat{\theta}$, reassemble the test, and select an item with maximum information from it,
were all within 1-2 seconds. These figures show that the approach proposed in this paper is practically feasible for 
item pools and test specifications such as those used in this example. 

The MSE functions of the EAP ability estimator after n = 10, 20, 30, 40, and 50 items are presented in Figure 1. For
n = 10, the functions for Condition 1 (constrained CAT, with order: IA, SA, SB) and Condition 3 (unconstrained
CAT) show about equal results for all values of $\theta$. The function for Condition 2 reveals relatively poor performance
for the CAT version with a more severely constrained section at the beginning of the test. However, the effect is
already small when 20 items are administered, and for more than 30 items the results for the three conditions are
identical for all practical purposes. The bias functions in Figure 2 show the same pattern. Note that in both figures
the results for the lower end of the $\theta$ scale tend to be somewhat poorer than those for the upper end. This difference
in performance is likely to be due to underrepresentation of some categories of items at the lower end of the scale 
in the item pool. 
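
For reference, the bias and MSE at a given true ability value can be estimated from replicated simulations as in the following Python sketch. The estimates in the usage line are invented and do not come from the study.

```python
def bias_and_mse(estimates, true_theta):
    """Estimate bias and mean-squared error of an ability estimator from replications."""
    n = len(estimates)
    errors = [e - true_theta for e in estimates]
    bias = sum(errors) / n
    mse = sum(err * err for err in errors) / n
    return bias, mse

# Hypothetical estimates from five replications at a true ability of 1.0.
bias, mse = bias_and_mse([0.8, 1.1, 1.3, 0.9, 1.0], 1.0)
```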

FIGURE 1. Estimated MSE functions of the EAP estimator after 10, 20, 30, 40, and 50 items (solid line:
unconstrained CAT; dashed line: constrained CAT, in the order IA, SA, SB; dotted line: constrained CAT, in the
order SA, SB, IA).

FIGURE 2. Estimated bias functions of the EAP estimator of ability after 10, 20, 30, 40, and 50 items (solid line:
unconstrained CAT; dashed line: constrained CAT, in the order IA, SA, SB; dotted line: constrained CAT, in the
order SA, SB, IA).

Discussion

As already observed, other adaptive testing formats that can be used to deal with constraints on test contents 
are multi-stage and testlet-based adaptive testing. In multi-stage testing, the content of the test is adapted only at the 
end of previously determined stages. In addition, at each stage only a limited number of options are available each 
designed to be optimal for a previously selected ability level. In contrast, the present format adapts the content of 
the test to the updated ability estimate after each new item, selects the remaining part of the test from all options 
feasible for the item pool, and guarantees maximum information. Testlet-based adaptive testing offers more 
flexibility than multi-stage testing but in principle the same differences hold. In the Stocking and Swanson (1993) 
approach, all test specifications and the objective of maximal information are combined into a weighted objective 
function. Next, the items are selected from the pool to optimize this function in a sequential mode. Applying the 
approach to the empirical example in this paper, weights would have to be specified to reflect the desirability of 
each of the 433 constraints in the model. As a consequence of this complexity, unpredictable violations of the 
constraints as well as the principle of maximum information may occur. The approach in this paper, however, 
requires all constraints to be met. In addition, it is not based on sequential selection of single items, but at each step 
selects all remaining items simultaneously to have maximum information at the ability estimate. 